
    Relaxed Functional Dependencies - A Survey of Approaches

    Recently, there has been renewed interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data cleaning, query relaxation, record matching, and so forth. In particular, the constraints defined for canonical functional dependencies have been relaxed to capture inconsistencies in real data, patterns of semantically related data, or semantic relationships in complex data types. In this paper, we survey 35 such functional dependencies, providing classification criteria, motivating examples, and a systematic analysis of them.
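
    To make the two notions concrete, the following is a minimal sketch (not taken from the survey) contrasting a canonical FD check with one common relaxation, an RFD relaxing on the extent that tolerates a bounded fraction of violating tuple pairs. The toy relation and the (deliberately loose) threshold are illustrative assumptions.

```python
# Minimal sketch (not from the survey): a canonical FD check versus an RFD
# relaxing on the extent, which tolerates a bounded fraction of violating
# tuple pairs. The toy relation and threshold are illustrative assumptions.
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Canonical FD: no two tuples agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

def rfd_holds(rows, lhs, rhs, max_violation_ratio):
    """RFD relaxing on the extent: the dependency may be violated by a
    bounded fraction of the tuple pairs that agree on lhs."""
    pairs = violations = 0
    for r1, r2 in combinations(rows, 2):
        if all(r1[a] == r2[a] for a in lhs):
            pairs += 1
            if any(r1[a] != r2[a] for a in rhs):
                violations += 1
    return pairs == 0 or violations / pairs <= max_violation_ratio

# One dirty tuple breaks the canonical FD zip -> city, but not the RFD
# (with a threshold chosen loosely, only to make the toy example work).
rows = [
    {"zip": "84084", "city": "Fisciano"},
    {"zip": "84084", "city": "Fisciano"},
    {"zip": "84084", "city": "Fisciano"},
    {"zip": "84084", "city": "Fsciano"},   # misspelled value
    {"zip": "80100", "city": "Napoli"},
]
print(fd_holds(rows, ["zip"], ["city"]))                             # False
print(rfd_holds(rows, ["zip"], ["city"], max_violation_ratio=0.5))   # True
```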

    Relaxed functional dependencies: definition, discovery and applications

    2016 - 2017
    Functional dependencies (FDs) were conceived in the early '70s, and were mainly used to verify database design and assess data quality. However, to solve several issues in emerging application domains, such as the identification of data inconsistencies, patterns of semantically related data, query rewriting, and so forth, it has been necessary to extend the FD definition...

    Incremental Discovery of Imprecise Functional Dependencies

    Functional dependencies (FDs) are among the metadata used to assess data quality and to perform data cleaning operations. However, in order to pursue robustness with respect to data errors, it has been necessary to devise imprecise versions of functional dependencies, yielding relaxed functional dependencies (RFDs). Among them, there is the class of RFDs relaxing on the extent, i.e., those admitting the possibility that an FD holds only on a subset of the data. In the literature, several algorithms have been defined to automatically discover RFDs from big data collections. They achieve good performance with respect to the inherent complexity of the problem. However, most of them can discover RFDs only by batch processing the entire dataset. This is not suitable in the era of big data, where the size of a database instance can grow at high velocity, and the insertion of new data can invalidate previously holding RFDs. Thus, it is necessary to devise incremental discovery algorithms capable of updating the set of holding RFDs upon data insertions, without processing the entire dataset. To this end, in this paper we propose an incremental discovery algorithm for RFDs relaxing on the extent. It manages the validation of candidate RFDs and the generation of possibly new RFD candidates upon the insertion of new tuples, while limiting the size of the overall search space. Experimental results show that the proposed algorithm achieves extremely good performance on real-world datasets.
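
    The incremental flavour of the problem can be sketched as follows. This is not the algorithm proposed in the paper, only a hedged illustration of how per-candidate statistics let the validity of an extent-relaxing RFD be updated on each insertion without re-scanning the dataset; the class, the statistics kept, and the threshold are assumptions.

```python
# Hedged sketch (not the paper's algorithm): keep per-candidate statistics so
# that inserting a tuple updates the violation count of an extent-relaxing
# RFD in place, without re-processing the whole dataset.
from collections import defaultdict

class IncrementalExtentRFD:
    """Candidate RFD lhs -> rhs that may hold only on a subset of the data."""

    def __init__(self, lhs, rhs, max_violation_ratio=0.05):
        self.lhs, self.rhs = lhs, rhs
        self.max_violation_ratio = max_violation_ratio
        # For each lhs value, count the tuples seen per rhs value.
        self.groups = defaultdict(lambda: defaultdict(int))
        self.pairs = 0        # tuple pairs agreeing on lhs
        self.violations = 0   # of those, pairs disagreeing on rhs

    def insert(self, row):
        key = tuple(row[a] for a in self.lhs)
        val = tuple(row[a] for a in self.rhs)
        group = self.groups[key]
        same_lhs = sum(group.values())   # tuples the new one pairs with
        same_rhs = group[val]            # of those, tuples it agrees with
        self.pairs += same_lhs
        self.violations += same_lhs - same_rhs
        group[val] += 1

    def holds(self):
        if self.pairs == 0:
            return True
        return self.violations / self.pairs <= self.max_violation_ratio

# Each new tuple only touches the lhs group it falls into, so the candidate
# can be re-checked after every insertion without scanning previous data.
candidate = IncrementalExtentRFD(["zip"], ["city"], max_violation_ratio=0.5)
for row in [{"zip": "84084", "city": "Fisciano"},
            {"zip": "84084", "city": "Fisciano"},
            {"zip": "84084", "city": "Fisciano"},
            {"zip": "84084", "city": "Fsciano"}]:
    candidate.insert(row)
print(candidate.holds())   # True: 3 violating pairs out of 6
```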

    Learning Effective Query Management Strategies from Big Data

    The availability of big data collections, together with powerful hardware and software mechanisms to process them, nowadays makes it possible to learn useful insights from data, which can be exploited for multiple purposes, including marketing, fault prevention, and so forth. However, it is also possible to learn important metadata that can suggest how data should be manipulated in several advanced operations. In this paper, we show the potential of learning from data by focusing on the problem of relaxing the results of database queries, that is, trying to return an approximate answer to a query when no exact result is available in the database and the system would otherwise return an empty answer set or, even worse, erroneous mismatched results. In particular, we introduce a novel approach to rewrite queries that are in disjunctive normal form and contain a mixture of discrete and continuous attributes. The approach preprocesses data collections to discover the implicit relationships that exist among the various domain attributes, and then uses this knowledge to rewrite the constraints of the failing query. In a first step, the approach tries to learn a set of functional dependencies from the data, which are ranked by mechanisms that subsequently allow predicting the order in which the extracted dependencies should be used to properly rewrite the failing query. An experimental evaluation of the approach on three real data sets shows its effectiveness in terms of robustness and coverage.
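
    A hedged sketch of such a rewriting loop (not the paper's implementation): candidate relaxations of a failing conjunctive query are applied in the order suggested by a ranking of the learned dependencies, until a non-empty answer is produced. The toy relation, the fixed ranking, and the +/-5 numeric slack are assumptions made only for illustration.

```python
# Hedged sketch (not the paper's implementation): apply candidate rewrites of
# a failing conjunctive query in the order given by a ranking of the learned
# dependencies, stopping as soon as a non-empty answer is obtained.

def evaluate(rows, query):
    """Evaluate a conjunction of (attribute, operator, value) constraints."""
    ops = {"=": lambda x, y: x == y, "~": lambda x, y: abs(x - y) <= 5}
    return [r for r in rows if all(ops[op](r[a], v) for a, op, v in query)]

def relax(query, attribute):
    """Toy rewrite: turn the equality on `attribute` into a +/-5 range."""
    return [(a, "~" if a == attribute else op, v) for a, op, v in query]

rows = [{"price": 95, "rooms": 3}, {"price": 120, "rooms": 2}]
query = [("price", "=", 100), ("rooms", "=", 3)]

# Attributes to rewrite, ordered by the rank of the dependency suggesting
# each rewrite (in the paper this ranking is learned from the data).
ranked_attributes = ["price", "rooms"]

answer = evaluate(rows, query)
for attribute in ranked_attributes:
    if answer:
        break
    query = relax(query, attribute)
    answer = evaluate(rows, query)
print(answer)   # [{'price': 95, 'rooms': 3}]
```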

    Synchronization of queries and views upon schema evolutions: A survey

    One of the problems arising upon the evolution of a database schema is that some queries and views defined on the previous schema version might no longer work properly. Thus, evolving a database schema entails the redefinition of queries and views to adapt them to the new schema. Although this problem has mainly been raised in the context of traditional information systems, solutions to it are also advocated in other database-related areas, such as Data Integration, Web Data Integration, and Data Warehouses. The problem is a critical one, since industrial organizations often need to adapt their databases and data warehouses to frequent changes in the real world. In this article, we provide a survey of existing approaches and tools for the problem of adapting queries and views upon a database schema evolution; we also propose a classification framework to enable a uniform comparison among many heterogeneous approaches and tools.

    Visual Data Integration based on Description Logic Reasoning

    Despite the many innovative systems supporting the data integration process, designers advocate more abstract metaphors to master the inherent complexity of this activity. In fact, the visual notations provided in many modern data integration systems might run into scale-up problems when facing the integration of big data sources. Thus, higher-level visual notations and automatic schema mapping mechanisms might be the key factors to make the data integration process more tractable. In this paper we present the Conceptual Data Integration Language (CoDIL), a visual language providing conceptual-level visual mechanisms to manipulate and integrate data sources, together with a formalization of the language's icon operators by means of the ALCN Description Logic. The formalization allowed us to define the logic-level semantics of CoDIL, providing reasoning rules for validating the correctness of a data integration process and for generating the logic-level reconciled schema.

    Dependency-based Query Result Approximation

    Failing queries are database queries returning few or no results. It might be useful to reformulate them in order to retrieve results that are close to those intended by the original queries. In this paper, we introduce an approach for rewriting failing queries that are in disjunctive normal form. In particular, the approach replaces some of the attributes of the failing queries with attributes semantically related to them by means of Relaxed Functional Dependencies (RFDs), which can be automatically discovered from data. The semantics of automatically discovered RFDs allow us to rank them so as to provide an application order during the query rewriting process. Experiments show that such an application order of RFDs yields a ranking of the approximate query answers that meets the expectations of the user.
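
    The attribute-substitution step can be illustrated with a hedged sketch (not the rewriting procedure implemented in the paper): given an RFD zip -> city discovered from the data, a failing equality constraint on city is traded for a constraint on the semantically related attribute zip, whose values are taken from the tuples that do match the original condition. The relation and the RFD are illustrative assumptions.

```python
# Hedged sketch (not the paper's rewriting procedure): use a discovered RFD
# zip -> city to replace a failing constraint on `city` with a constraint on
# the semantically related attribute `zip`.
rows = [
    {"zip": "84084", "city": "Fisciano", "price": 90},
    {"zip": "84084", "city": "Fsciano",  "price": 110},   # dirty value
    {"zip": "80100", "city": "Napoli",   "price": 150},
]

def rewrite_with_rfd(rows, constraint, rfd):
    """Swap an equality constraint on the RFD's rhs attribute for a
    membership constraint on its lhs attribute."""
    attr, value = constraint
    lhs, rhs = rfd
    assert attr == rhs
    lhs_values = {r[lhs] for r in rows if r[rhs] == value}
    return lhs, lhs_values

# The exact query `city = 'Fisciano'` misses the misspelled tuple ...
exact = [r for r in rows if r["city"] == "Fisciano"]
# ... while the rewritten query on `zip` recovers it.
lhs, values = rewrite_with_rfd(rows, ("city", "Fisciano"), ("zip", "city"))
approximate = [r for r in rows if r[lhs] in values]
print(len(exact), len(approximate))   # 1 2
```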